Cargese,  Corsica,  July  2004 


On  Fourier  and  Wavelets: 
Representation,  Approximation  and  Compression 

Martin  Vetterli 
EPFL  &  UC  Berkeley 

1.  Introduction  through  History 

2.  Fourier  and  Wavelet  Representations 

3.  Wavelets  and  Approximation  Theory 

4.  Wavelets  and  Compression 

5.  Going  to  Two  Dimensions:  Non-Separable  Constructions 

6.  Beyond  Shift  Invariant  Subspaces:  Finite  Rate  of  Innovation 

7.  Conclusions  and  Outlook 


ISIT04  1 


Report  Documentation  Page 

Form  Approved 

OMB  No.  0704-0188 

Public  reporting  burden  for  the  collection  of  information  is  estimated  to  average  1  hour  per  response,  including  the  time  for  reviewing  instructions,  searching  existing  data  sources,  gathering  and 
maintaining  the  data  needed,  and  completing  and  reviewing  the  collection  of  information.  Send  comments  regarding  this  burden  estimate  or  any  other  aspect  of  this  collection  of  information, 
including  suggestions  for  reducing  this  burden,  to  Washington  Headquarters  Services,  Directorate  for  Information  Operations  and  Reports,  1215  Jefferson  Davis  Highway,  Suite  1204,  Arlington 

VA  22202-4302.  Respondents  should  be  aware  that  notwithstanding  any  other  provision  of  law,  no  person  shall  be  subject  to  a  penalty  for  failing  to  comply  with  a  collection  of  information  if  it 
does  not  display  a  currently  valid  OMB  control  number. 

1.  REPORT  DATE 

07  JAN  2005 

2.  REPORT  TYPE 

N/A 

3.  DATES  COVERED 

4.  TITLE  AND  SUBTITLE 

5a.  CONTRACT  NUMBER 

On  Fourier  and  Wavelets:  Representation,  Approximation  and 

5b.  GRANT  NUMBER 

5c.  PROGRAM  ELEMENT  NUMBER 

6.  AUTHOR(S) 

5d.  PROJECT  NUMBER 

5e.  TASK  NUMBER 

5f.  WORK  UNIT  NUMBER 

7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

EPFL;  UC  Berkeley 

8.  PERFORMING  ORGANIZATION 

REPORT  NUMBER 

9.  SPONSORING/MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 

10.  SPONSOR/MONITOR'S  ACRONYM(S) 

11.  SPONSOR/MONITOR'S  REPORT 
NUMBER(S) 

12.  DISTRIBUTION/AVAILABILITY  STATEMENT 

Approved  for  public  release,  distribution  unlimited 

13.  SUPPLEMENTARY  NOTES 

See  also  ADM001750,  Wavelets  and  Multifractal  Analysis  (WAMA)  Workshop  held  on  19-31  July  2004., 

The  original  document  contains  color  images. 

14.  ABSTRACT 

15.  SUBJECT  TERMS 

16.  SECURITY  CLASSIFICATION  OF: 

17.  LIMITATION  OF 
ABSTRACT 

uu 

18.  NUMBER 
OF  PAGES 

84 

19a.  NAME  OF 
RESPONSIBLE  PERSON 

a.  REPORT 

unclassified 

b.  ABSTRACT 

unclassified 

c.  THIS  PAGE 

unclassified 

Standard  Form  298  (Rev.  8-98) 

Prescribed  by  ANSI  Std  Z39-18 


Acknowledgements 


Collaborations:  Sponsors:  NSF  Switzerland 

•  T.BIu,  EPFL 

•  M . Do,  UIUC 

•  P.L.Dragotti,  Imperial  College 

•  P.Marziliano,  NIT  Singapore 

•  I.Maravic,  EPFL 

•  R.Shukla,  EPFL 

•  C.Weidmann,  TRC  Vienna 

Discussions  and  Interactions: 

•  A. Cohen,  Paris  VI 

•  I.  Daubechies,  Princeton 

•  R. DeVore,  Carolina 

•  D.  Donoho,  Stanford 

•  M.Gastpar,  Berkeley 

•  V.Goyal,  MIT 

•  J.  Kovacevic,  CMU 

•  S.  Mallat,  Polytech.  &  NYU 

•  M.Unser,  EPFL 


ISIT04  2 


Outline 


1.  Introduction  through  History 

•  From  Rainbows  to  Spectras 

•  Signal  Representations 

•  Approximations 

•  Compression 

2.  Fourier  and  Wavelet  Representations 

3.  Wavelets  and  Approximation  Theory 

4.  Wavelets  and  Compression 

5.  Going  to  Two  Dimensions:  Non-Separable  Constructions 

6.  Beyond  Shift  Invariant  Subspaces 

7.  Conclusions  and  Outlook 


ISIT04  3 


From  Rainbows  to  Spectras 


Von  Freiberg,  1304:  Primary  and  secondary  rainbow 


Newton  and  Goethe 


Signal  Representations 


1807:  Fourier  upsets  the  French  Academy.... 

f(t)  t 


t 

+  /W  +  w 


Fourier  Series:  Harmonic  series,  frequency  changes,  f0,  2f0,  3f 


But...  1898:  Gibbs’  paper  1899:  Gibbs’  correction 


Orthogonality,  convergence,  complexity 


1910:  Alfred  Haar  discovers  the  Haar  wavelet 
“dual”  to  the  Fourier  construction 


f(t)  1 


•  Scale  changes  S0,  2S0,  4S0,  8S0  ... 

•  orthogonality 
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Theorem  1  (Shannon-48,  Whittaker-35,  Nyquist-28,  Gabor-46) 

If  a  function  f(t)  contains  no  frequencies  higher  than  W  cps,  it  is  com¬ 
pletely  determined  by  giving  its  ordinates  at  a  series  of  points 
spaced  1/(2W)  seconds  apart. 

[if  approx.  T  long,  W  wide,  2TW  numbers  specify  the  function] 


It  is  a  representation  theorem: 

•  {si nc(t-n)}n  in  z,  is  an  orthogonal  basis  for  BL[-jt,jt] 

•  f(t)  in  BL[-jt,jt]  can  be  written  as  f(t))  =  V f(n)  •  sinc(t-n) 


Note: 

•  Shannon-BW,  BL  sufficient,  not  necessary. 

•  many  variations,  non-uniform  etc 

•  Kotelnikov-33! 
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Representations,  Bases  and  Frames 


Ingredients: 

•  as  set  of  vectors,  or  ‘’atoms’’,  {cpn} 


•  an  inner  product,  e.g.  (cpn,f)  =  j(cpn-f) 

•  a  series  expansion 


f(t)  = 


n 


Many  possibilities: 

•  orthonormal  bases  (e.g.  Fourier  series,  wavelet  series) 

•  biorthogonal  bases 

•  overcomplete  systems  or  frames 


eo  _  <Po 


UTF 


Note:  no  transforms,  uncountable 
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Approximations,  aproximation... 


The  linear  approximation  method 


Given  an  orthonormal  basis  { gn }  for  a  space  S  and  a  signal 


f  =  2^f’gn)  '8n> 
n 

the  best  linear  approximation  is  given  by  the  projection  onto  a  fixed  sub¬ 
space  of  size  M  (independent  of  f!) 


The  error  (MSE)  is  thus 


fM 


n£J 


M 


(f?  §n)  §n 


8 


M  “ 


2 


f-f 


2  Kf>gn> 

n^JM 


Ex:  Truncated  Fourier  series 

project  onto  first  M  vectors  corresponding  to  largest  expected  inner 
products,  typically  LP 
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The  Karhunen-Loeve  Transform:  The  Linear  View 


Best  Linear  Approximation  in  an  MSE  sense: 

Vector  processes.,  i.i.d.: 

X  =  [X0,  Xj,  XN_  JT  E[Xj]  =  0  E[X  •  XT]  =  Rx 


Consider  linear  approximation  in  a  basis 

M-  1 


Xm  =  2  ^X’  gn>  '  gn 

n  =  0 


M  <  N 


Then: 


E[sm] 


N-  1 


n  =  M 


Karhunen-Loeve  transform  (KLT): 

For  0<M<N,  the  expected  squared  error  is  minimized  for  the  basis  {gn} 
where  gm  are  the  eigenvectors  of  Rx  ordered  in  order  of  decreasing 
eigenvalues. 


Proof:  eigenvector  argument  inductively. 

Note:  Karhunen-47,  Loeve-48,  Hotelling-33,  PCA,  KramerM-56,  TC 
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Geometric  intuition:  Principal  axes  of  distribution: 


Shapes:  ellipsoids 

To  first  approximation,  keep  all  coefficients  above  a  threshold 


This  can  be  used  in  many  settings,  classification,  denoising, 
and  compression  (inverse  waterfilling  thm) 


Compression:  How  many  bits  for  Mona  Lisa? 


<=>  {0,1} 
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A  few  numbers... 


D. Gabor,  September  1959  (Editorial  IRE) 

"...  the  20  bits  per  second  which,  the  psychologists  assure  us,  the 
human  eye  is  capable  of  taking  in,  ..." 

Index  all  pictures  ever  taken  in  the  history  of  mankind 

•  100  years  •  1010  ~  44  bits 

Huffman  code  Mona  Lisa  index 

•  a  few  bits  (Lena  Y/N?,  Mona  Lisa...),  what  about  R(D). . . . 

Search  the  Web! 

•  http://www.google.com,  5-50  billion  images  online,  or  33-36  bits 

JPEG 

•  186K...  There  is  plenty  of  room  at  the  bottom! 

•  JPEG2000  takes  a  few  less,  thanks  to  wavelets... 

Note:  2(256x256x8)  possible  images  (D. Field) 

Homework  in  Cover-Thomas,  Kolmogorov,  MDL,  Occam,  DNA,  etc 

(from  a  contemporary:  0  bits,  I  don’t  care  for  this  modern  stuff) 
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Source  Coding:  some  background 


Exchanging  description  complexity  for  distortion: 

•  rate-distortion  theory  [Shannon:58,  Berger:71] 

•  known  in  few  cases. ..like  i.i.d.  Gaussians  (but  tight:  no  better  way!) 


or  -6dB/bit 


•  typically:  difficult,  simple  models,  high  complexity  (e.g.  VQ) 

•  high  rate  results,  low  rate  often  unknown 
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Limitations  of  the  Standard  Models 


“Splendeurs  et  miseres  de  la  fonction  debit-distortion”  (after  Balsac) 
Precise  results 

•  beautiful  (maybe  too  much  for  its  own  good) 

•  upper  and  lower  bounds 

•  constructive 

Problems 

•  complexity:  exponential  in  code  length 

•  code  construction:  finding  good  codes  is  hard 

Paradox: 

•  Best  codes  used  in  practice  are  suboptimal  (Effros) 

•  transform  codes  dominate  the  scene  of  ‘’real”  compression 

Audio/Image/Video:  distortion  measures? 

So:  unlike  in  lossless  compression,  lossy  compression 
uses  IT  in  a  limited  way;) 


ISIT04  15 


New  image  coding  standard  ...  JPEG  2000 


Old  versus  new  JPEG:  D(R)  on  log  scale 


Main  points: 

•  improvement  by  a  few  dB’s 

•  lot  more  functionalities  (e.g.  progressive  download  on  internet) 

•  at  high  rate  ~  -6db  per  bit:  KLT  behavior 

•  low  rate  behavior:  much  steeper:  NL  approximation  effect? 

•  is  this  the  limit? 
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The  Swiss  Army  Knife  Formula  of  Transform  Coding  [GoyalOO] 


Model:  iid  vector  process  of  size  N,  [x=0,  Rx,  MSE,  high  rate 

•  vector  quantizer,  entropy  code 


x 


Y 


x 


•  transform  and  scalar  quantizers,  entropy  code 


I—  _  _  _  _  _____  _  _  J  L  _  _  _  _  _  _  _  _  _  _  _  J 


D(R)  = 


1 

I2N 


•  tr(T  ‘(T  *)T)  -2 


Trace  min:  ortho;  diff.  entropy  min:  independence 

•  Gaussian  case:  coincide!  but  in  general  not... 
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Representation,  Approximation  and  Compression: 

Why  does  it  matter  anyway? 

Parsimonious  or  sparse  representation  of  visual  information  is  key  in 

•  storage  and  transmission 

•  indexing,  searching,  classification,  watermarking 

•  denoising,  enhancing,  resolution  change 

But:  it  is  also  a  fundamental  question  in 

•  information  theory 

•  signal/image  processing 

•  approximation  theory 

•  vision  research 

Successes  of  wavelets  in  image  processing: 

•  compression  (JPEG2000) 

•  denoising 

•  enhancement 

•  classification 

Thesis:  Wavelet  models  play  an  important  role 
Antithesis:  Wavelets  are  just  another  fad! 
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Interaction  of  topics 


•  AT:  deterministic  setting,  large  classes  of  fcts 

•  HA:  function  classes,  existence,  embeddings 

•  IT:  boundings,  converses,  stochastic  setting 

•  SP:  bases,  algorithms,  complexity 

The  interaction  is  the  fun! 

ISIT04  19 


Outline 


1.  Introduction  through  History 

2.  Fourier  and  Wavelet  Representations 

•  Fourier  and  Local  Fourier  Transforms 

•  Wavelet  Transforms 

•  Piecewise  Smooth  Signal  Representations 

3.  Wavelets  and  Approximation  Theory 

4.  Wavelets  and  Compression 

5.  Going  to  Two  Dimensions:  Non-Separable  Constructions 

6.  Beyond  Shift  Invariant  Subspaces:  Finite  Rate  of  Innovation 

7.  Conclusions  and  Outlook 


ISIT04  20 


2.  Fourier  and  Wavelet  Representations:  Spaces 


i/p 


\  i/p 


Norms:  ||x||p  =  (y|x[n]|p)  ‘  ||f||p  =  J  |f(t)|Fdt 

n  '—00  ^ 

Hilbert  spaces:  12(Z)  =  {x:(||x||2  <  oo)}  L2(R)  =  {f:(||f||2  <  °°)} 

Inner  product:  (x,y)  =  ^x'l[n]y[n]  (f, g)=Jf*(t)g(t)dt 


n 


Orthogonality:  x_Lyo(x,y)  =  0 

Banach  spaces: 


X,  f  s.t.  ||x||p,  ||f||p  <00  p  general 
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A  Tale  of  Two  Representations:  Fourier  versus  Wavelets 

Orthonormal  Series  Expansion 

f  =  2  a nCP"  =  «Pn.<Pm>  =  8n-m  11%  =  Hall2 

n£Z 

Time-Frequency  Analysis  and  Uncertainty  Principle 

f(t)<-»F(co)  A2t  =  Jt2|f(t)|dt  A2co  =  Jo)2|F(co)|doa 

Then 


oa 


not  arbitrarily  sharp 
in  time  and  frequency 
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Local  Fourier  Basis? 


The  Gabor  or  Short-time  Fourier  Transform 


Tin  nW  =  w(t-nT)e 


-jmo)0(t-nT) 


Time-frequency  atoms  localized  at  (nT,mco0) 

frequency 


When  T,  (jo0  “small  enough” 

f(t)~c-Fm  ncpm  n(t)  where  Fm  n  =  <cpm  f> 
Example:  Spectrogram 
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The  Bad  News... 


Balian-Low  Theorem 

cpmn  is  a  short-time  Fourier  frame  with  critical  sampling  (To o0  =  2je) 

then  either 

2  2 

At  =  o°orAco  =  o° 

or:  there  is  no  good  local  orthogonal  Fourier  basis! 

Example  of  a  basis:  block  based  Fourier  series 


r\ 

T\ 

7T 

-T 

0 

\J 

\J 

\J 

T 

2T 

Note:  consequence  of  BL  Thm  on  OFDM,  RIAA 
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The  Good  News! 


There  exist  good  local  cosine  bases. 

jmo)Qt 

Replace  complex  modulation  (e  )  by  appropriate  cosine  modulation 


<Pm,nW  =  w(t-nT)cos(?(m  +  I)(t-nT  +  I)) 


|w(t)|2 
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Another  Good  News! 


Replace  (shift,  modulation) 
by  (shift,  scale) 

or 


\nW  =  2-m/2*P 


\ 

t  -  2 

n 

2 

/ 

n,  m  E  Z 


then  there  exist  “good”  localized  orthonormal  bases,  or  wavelet  bases 

frequency 


w 

w 
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Examples  of  bases 


Haar 


Daubechies,  D2 
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Wavelets  and  representation  of  piecewise  smooth  functions 


Goal:  efficient  representation  of  signals  like: 


where: 

•  Wavelet  act  as  singularity  detectors 

•  Scaling  functions  catch  smooth  parts 

•  “Noise”  is  circularly  symmetric 

Note:  Fourier  gets  all  Gibbs-ed  up! 
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Key  characteristics  of  wavelets  and  scaling  functions 

Wavelets  derived  from  filter  banks,  ortho-LP  with  N  zeroes  at  jt, 
[Daubechies-88], 

G(z)  =  (1  +  z  *)  •  R(z) 

oo  .  i 

Scaling  function:  (j>(co)  =  G^ej(l°/(2  ^ 

i  =  1 

— m/2  — m 

Orthonormal  wavelet  family:  4>m  „(*)  =  2  t|>(2  t-n) 

Scaling  function  and  approximations 

•  Strang-Fix  theorem:  if  c|>(cd)  has  N  zeros  at  multiples  of  2jt  (but  the  or¬ 
igin),  then  {qp(t-n)}nez  spans  polynomials  up  to  degree  N-1 

^cn  •  cp(t -  n)  =  tk  k  =  0,  1,  ...N  -  1 

n 

•  Two  scale  equation: 

<p(t)  =  ^='28n'cP(2t-n) 

vz  n 


•  smoothness:  follows  from  N,  a  =  0,203  N 
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Lowpass  filters  and  scaling  functions  reproduce  polynomials 

•  Iterate  of  Daubechies  L=4  lowpass  filter  reproduces  linear  ramp 


scaling 

function 


0  50  100  150  200  250 


linear 

ramp 


Scaling  functions  catch 


‘’trends99  in  signals 
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Wavelet  approximations 

•  wavelet  xp  has  N  zero  moments,  kills  polynomials  up  to  deg.  N-1 

•  wavelet  of  length  L=  2N-1,  or  2N-1  coeffs  influenced  by  singularity  at 
each  scale,  wavelet  are  singularity  detectors, 


wavelet  coefficients  of  smooth  functions  decays  fast, 
e.g.  f  in  cp,m  <<  0 


OtVn’*)  =  c2 


m|p-2 


Note:  all  this  is  in  1  dimension  only,  2D  is  another  story... 


ISIT04  31 


How  about  singularities? 


If  we  have  a  singularity  of  order  n  at  the  origin 
(0:  Dirac,  1:  Heaviside,...),  the  CWT  transform  behaves  as 

,  ,  H) 

X(a,  0)  =  cn*a 


In  the  orthogonal  wavelet  series:  same  behavior,  but  only  L=2N-1  co¬ 
efficients  influenced  at  each  scale! 

•  e.g.  Dirac/Heaviside:  behavior  as  2“m/2  and  2m/2,  m<<0 

Wavelets  catch  and  characterize  singularities! 
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Thus:  a  piecewise  smooth  signal  expands  as: 


•  lowpass  catches  trends,  polynomials 

•  a  singularity  influences  only  L  wavelets  at  each  scale 

•  wavelet  coefficients  decay  fast 
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More  Spaces 


Cp  spaces:  p-times  diff.  with  bounded  derivatives 

->  Taylor  expansions 

Holder/Lipschitz  a:  locally  a  smooth  (non-integer) 


Sobolev  Spaces  WS(R) 


OC 


2s  i 


fEl  (R)  J  |a)|  |F(oo)|  dco  <  oo 


1  —00 

If  s  >  n  +  -  then  f  is  n-times  continuously  differentiable 
Equivalently  F(oo)  decays  at 


Besov  Spaces  B  (I)  with  respect  to  a  basis  (typically  wavelets) 


fer(l) 


B 


,P=  22l<XIV„,f> 


m  n 


y/p 


<  oo 


or  wavelet  expansion  has  finite  lp  norm 
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From  linear  to  non-linear  approximation  theory 


The  non-linear  approximation  method 

Given  an  orthonormal  basis  { gn }  for  a  space  S  and  a  signal 

f  =  2^f’gn)  'Sn, 


n 


the  best  nonlinear  approximation  is  given  by  the  projection  onto 
adapted  subspace  of  size  M  (dependent  on  f!) 


fM  =  2  <f’  gn>  ’  gn 


T  • 
lM’ 


The  error  (MSE)  is  thus 


neIM 

Kf’gn)|„eiMa 

<f>  gm)| 

thus 

~  2 

M 


set  of  M  largest  (  , ) 


em  - 


f-f 


<f’  g„: 


nfl 


M 


and  em  <  8m. 


Difference:  take  the  first  M  coeffs  (linear)  or 

take  the  largest  M  coeffs  (non-linear) 


Nonlinear  approximation 

•  This  is  a  simple  but  nonlinear  scheme 

•  Clearly,  if  AM(.)  is  the  NL  approximation  scheme: 

Am(x)  +  AM(y)  *  Am(x  +  y) 

This  could  be  called  “adaptive  subspace  fitting” 


From  a  compression  point  of  view,  you  “pay”  for  the  adaptivity 

•  in  general,  this  will  cost 


log 


N 

k 


\ 


bits 


which  cannot  be  spent  on  coefficient  representation  anymore 


LA:  pick  a  subspace  a  priori  NLA  pick  after  seeing  the  data 


ISIT04  37 


Non-Linear  Approximation  Example 


Nonlinear  approximation  power  depends  on  basis 

Example: 

if(t) 

cst  - 


l/sqrt2  1 

Two  different  bases  for  [0,1]: 

i—  ■  •  r  ]2jrkt1 

•  Fourier  series  {eJ 

•  Wavelet  series:  Haar  wavelets 


Linear  approximation  in  Fourier  or  wavelet  bases 

8m  ~  1 

Nonlinear  approximation  in  a  Fourier  basis 

sM~  1/M 


Nonlinear  approximation  in  a  wavelet  basis 

£m~1/2 


t 
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Fourier  versus  Wavelet  bases,  LA  versus  NLA 


N=  1024,  M  =  64 

Fourier  (left):  LA  versus  NLA  does  not  matter 
Wavelets  (right):  NLA  does  orders  of  magnitude  better! 
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Nonlinear  approximation  theory  and  wavelets 


Approximation  results  for  piecewise  smooth  fcts 

•  between  discontinuities, 

behavior  by  Sobolev  or  Besov  regularity 

•  k  derivatives  =>  coeffs  _2m(k-1/2)  when  m«0 

•  Besov  spaces  can  be  defined  with  wavelet  bases.  If 

llf|lG,P  =  (2Kf’gn>|P)1/P<  00  0  <  P  <  2 

then  [DeVoreJL92] : 

em  =  0(M1_2/p) 
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Approximation  in  Sobolev  and  Besov  Spaces 


Nonlinear  Approximation  Error  for  Piecewise  Smooth  Functions 


Linear  Approximation,  Ws[0,N] 

•  Sobolev-s:  uniformly  smooth 

•  Fourier:  gM  =  M_2s_8  S>0 

•  Wavelets:  gM  =  M_2s_8  6>0 


Non-Linear  Approximation 

•  Besov-s:  smooth  between  a  finite  #  of 
discontinuities 


Number  of  retained  coefficients  (N) 


-1 


•  Fourier:  does  not  work,  gM  =  M~ 

•  Wavelets:  approximation  power  given  by  the  smoothness! 

•  Key:  effect  of  discontinuities  limited,  because  wavelets  are 
concentrated  around  discontinuties 

•  f(t)  in  Ws(0,N)  between  finite  #  of  discontinuities, 
then  f(t)  in  Bp(0,N)  (wavelet  of  compact  support) 

•  Then: 


1 

P 


<  s 


result  can  be  refined  to  get  £M  =  M 


-2s -6 


6  >  0 
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4.  Wavelets  and  Compression 


Compression  is  just  one  bit  trickier  than  approximation... 


A  small  but  instructive  example: 

Assume 

•  x[n]  =  a  6[n-k],  signal  is  of  length  N,  k  is  U[0,N-1]  and  a  is  N(0,1). 

•  This  is  a  Gaussian  RV  at  location  k 

A  ~N(0,1) 

A 

k  N 

•  Note:  Rx  =  I! 

Linear  approximation: 

1 

8m  -  M 


Non-linear  approximation,  M  >  0: 


£ 


M 
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Given  budget  R  for  block  of  size  N: 


1.  Linear  approximation  and  KLT:  equal  distribution  of  R/N  bits 


D(R)  =  c-G2 


.  2-2(R/N) 


This  is  the  optimal  linear  approximation  and  compression! 


2.  Rate-distortion  analysis  [Weidmann:99] 


High  rate  case: 

•  Obvious  scheme:  pointer  +  quantizer 


D(R)  =  c-g2 


.  2-2(R-logN) 


•  This  is  the  R(D)  behavior  for  R  »  Log  N 

•  Much  better  than  linear  approximation 

Low  rate  case: 

•  Hamming  case  solved,  inc.  multiple  spikes: 

-  there  is  a  linear  decay  at  low  rates 

•  L2  case:  upper  bounds  that  beat  linear  approx. 
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Example  1:  Binary,  Hamming,  1  and  k  spikes 


Example  2:  Bernoulli-Gaussian 


p=0.1 1 


BernoulliT  0.11  x  Gaussian  Spike 
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Piecewise  smooth  functions:  pieces  are  Lipschitz-a 


The  following  D(R)  behavior  is  reachable  [CohenDGO:02] 

_ O  a  i —  ^  A 

D(R)  =  Cj-R  +C3-  Jr- 2 

There  are  2  modes: 

_ ^  oj, 

•  R  corresponding  to  the  Lipschitz-a  pieces 

•  Jr  •  2~c ^  corresponding  to  the  discontinuities 


Lipschitz-a  pieces:  Linear  Approximation 

The  wavelet  transform  at  scale  j  decays  as  (j  <<  0) 

Wj  =  2j(a  +  1/2) 

Keep  coefficients  up  to  scale  J,  or  choose  a  stepsize  A  for  a  quantizer 

^  ^J(a  +  1/2) 


Therefore,  M  ~  2J  coefficients 
Squared  error: 

-j 

J  .  1/2)  2_2Ja  |yj— 2ct 

j  =  -00 

Rate: 

•  number  of  coefficients  c-M 

Thus 

D(R)~cR“2a 

Just  as  good  as  Fourier  (~R_2ct),  but  local! 
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Rate-distortion  bounds  for  piecewise  polynomial  functions 


D(R)  behavior  of  nonlinear  approximation  with  wavelets 


f(t)  uniformly  distributed 

random  constant 


V 


►t 


Consider  the  simplest  case:  Haar! 

Recall  that 

E  J2~M  c  -2J/2 
fcM  -  z  cj  “  z 

and  consider  describing  the 
significantcoefficients 


Choose  a  stepsize  A  for  a  quantizer. 
Therefore 

•  number  of  scales  J  before  coefs 
set  to  zero  ~log(l/A) 

•  number  of  bits  per  coefficient 
~ log ( 1  /A ) ,  thus  R~J2 

Distortion:  number  of 
scales  times  A  ~J -2 ~J 


Thus 


DW(R)  =  C3-7R-2_  2 
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Rate-distortion  behavior  using  an  oracle 

An  oracle  decides  to  optimally  code  a  piecewise  polynomial 
by  allocating  bits  ‘’where  needed”: 

Consider  the  simplest  case 


Two  approximation  errors 

•  At:  quantization  of  step  location 

•  Aa:  quantization  of  amplitude 

Rate  allocation:  Rt  versus  Ra 
Result: 

Dp(R)  =  Cj  •  2-R/2 
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Piecewise  polynomial,  with  max  degree  N 

A.  Nonlinear  approximation  with  wavelets  having  N+l  zero  moments 

DW(R)  =  C'w  •  (1  +  aJC~R)  ■ 

B.  Oracle-based  method 

-(c  -r) 

Dp(R)  =  C'p-2  p 

Thus 

•  wavelets  are  a  generic  but  suboptimal  scheme 

•  oracle  method  asymptotically  superior  but  dependent  on  the  model 

Conclusion  on  compression  of  piecewise  smooth  functions: 

D(R)  behavior  has  two  modes: 

_ 0  Cl  t ^  A  /n/R 

D(R)  =  Cj  •  R  +  c3  •  JR  •  2 

•  1/polynomial  decay:  cannot  be  (substantially)  improved 

•  exponential  mode:  can  be  improved,  important  at  low  rates 
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Can  we  improve  wavelet  compression?  Footprints! 

Key:  Remove  depencies  accross  scales: 

•  dynamic  programming:  Viterbi-like  algorithm 

•  tree  based  algorithms:  pruning  and  joining 

•  wavelet  footprints:  wavelet  vector  quantization 

Theorem  [DragottiV:03]: 

Consider  a  piecewise  smooth  signal  f(t),  where  pieces  are  Lipschitz- 
a.  There  exists  a  piecewise  polynomial  p(t)  with  pieces  of  maximum 
degree  L«J  such  that  the  residual  ra(t)  =  f(t)-p(t) 

is  uniformly  Lipschitz-a. 

This  is  a  generic  split  into  piecewise  polynomial  and  smooth  residual 


function  f(t) 


+ 


piecewise  polynomial 


Lipschitz-a 
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Footprint  Basis  and  Frames 


Suboptimality  of  wavelets  for  piecewise  polynomials  is  due 
to  independent  coding  of  dependent  wavelet  coefficients 

DW(R) ~ c  •  Jr  ■  2  ^ 


Compression  with  wavelet  footprints 


Theorem:  [Dragotti V:03] 

Given  a  bounded  piecewise  polynomial  of  deg  D  with  K  discontinui¬ 
ties.  Then, a  footprint  based  coder  achieves 


D(R)  =  Cj  -2 


~(c2  •  R) 


This  is  a  computational  effective  method  to  get  oracle  performance 

What  is  more,  the  generic  split  “piecewise  smooth”  into 
“uniformly  smooth  +  piecewise  polynomial”  allows  to  fix 
wavelet  scenarios,  to  obtain 

-2a  -C1 ' R 

D(R)  =  Cj  •  R  +  c2  •  2 

This  can  be  used  for  denoising  and  superresolution 
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Denoising  (use  coherence  across  scale) 


Original  signal 


Hard-Thresholding  (SNR=21.3dB) 


Noisy  Signal  (SNR=15.62dB) 


Cycle-Spinning  (SNR=25.4dB) 


This  is  a  vector  thresholding 
method  adapted  to  wavelet  ~ 
singularities 


Denoising  with  Footprints  (SNR=27.2dB) 
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5.  Going  to  Two  Dimensions:  Non-Separable  Constructions 


Going  to  two  dimensions  requires  non-separable  bases 
Objects  in  two  dimensions  we  are  interested  in 


— 2R 

•  textures:  D(R)  =  Cn*2  per  pixel 

•  smooth  surfaces:  D(R)  =  Cl-2  per  object! 


ISIT04  55 


Models  of  the  world: 


Gauss-Markov 


Piecewise  polynomial 


the  usual  suspect 


Many  proposed  models: 

•  mathematical  difficulties 

•  one  size  fits  all... 

•  Lena  is  not  PC,  but  is  she  BV? 

But:  Fourier,  DCT,  wavelets  use  a 
separable  approach  (line/column...) 

=>  geometry  based  image  processing 
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Recent  work  on  geometric  image  processing 

Long  history:  compression,  vision,  filter  banks 
Current  affairs: 

Signal  adapted  schemes 

•  Bandelets  [LePennec  &  Mallat]:  wavelet  expansions  centered  at 
at  discontinuity  as  well  as  along  smooth  edges 

•  Non-linear  tilings  [Cohen,  Mattei] :  adaptive  segmentation 

•  Tree  structured  approaches  [Shukla  et  al,  Baraniuk  et  al] 

Bases  and  Frames 

•  Wedgelets  [Donoho]:  Basic  element  is  a  wedge 

•  Ridgelets  [Candes,  Donoho]:  Basic  element  is  a  ridge 

•  Curvelets  [Candes,  Donoho] 

Scaling  law:  width  -length2 
L(R2)  set  up 

•  Multidirectional  pyramids  and  contourlets  [Do  et  al] 
Discrete-space  set-up,  l(Z2) 

Tight  frame  with  small  redundancy 
Computational  framework 

This  is  where  the  action  is! 
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Nonseparable  schemes  and  approximation 


Approximation  properties: 

•  wavelets  good  for  point  singularities 

•  ridgelets  good  for  ridges 

•  curvelets  good  for  curves 


Consider  c2  boundary  between  two  csts 


Rate  of  approximation,  M-term  NLA  in  bases,  c2  boundary 

•  Fourier:  0(M'1/2) 

•  Wavelets:  0(M'1) 

•  Curvelets:  0(M'2) 
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Compression  of  non-separable  objects 


Objects  we  know  how  to  compress.... 


element 


Approximation 

•  Wavelets  Em~M_1 

•  Ridgelets  EM~2“M 

Rate/distortion 

— 9R 

•  Oracle  D(R)  =  C-2 

•  Wavelets... .poor 

•  Ridgelets. ...suboptimal 

•  adaptive  schemes:  close  to  oracle 

•  fixed  basis:  under  investigation 
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Tree  Based  Geometric  Compression  [ShuklaDDV:03] 


Idea 

•  tree  and  quadtree  algorithms  popular,  many  pruning  algorithms 
optimality  proofs  for  wedgelets  [Donoho:99] 

•  new  pruning  and  joining  algorithm 


Intuition:  full  tree  dyadic  tree 


pruned  &  joined  tree 


Nj~2J  D(R)  ~  R  1  Nj~J  D(R)~VR'2Ci^  Nj  ~  J°  D(R)~2c2R 

Results:  Rate-distortion  optimal  for  piecewise  polynomials 

-(c2  •  R) 

D(R)  =  Cj  •  2  that  is,  like  an  oracle  method  (up  to  constants) 
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Extension  to  Quadtree: 


Results: 

•  consider  a  piecewise  polynomial  2D  signal,  with  polynomial 
boundaries,  the  following  rate-distortion  behavior  is  achieved 

,  x  -(c4  ■  R) 

D(R)  =  c3  -2 


•  this  is  like  an  oracle  method,  and  >>  than  prune  algorithms 
which  have  a  Jr  penalty 

•  complexity:  polynomial 
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The  prune-join  quadtree  algorithm 

•  polynomial  fit  to  surface  and  to  boundary  on  a  quadtree 

•  rate-distortion  optimal  tree  pruning  and  joining 


quadtree  with  R(D)  pruning  R(D)  Joining  of  "similar”  leaves 


Note:  careful  R(D)  optimization! 
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Geometric  Compression  versus  JPEG2000  at  0.11  bits/pixel,  PSNR: 


30.01 


29.22 


pruned-joined  quadtree 


JPEG2000 
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Behavior  of  tree  algorithms  on  piecewise  smooth  fcts 

ppf:  piecewise  polynomial  functions 
psf:  piecewise  smooth  functions,  a-smooth 


Signal 

Class 

Oracle 

Coder 

Wavelet 

Coder 

Prune  tree 

Coder 

Prune -join 
tree  Coder 

1-D  PPF 

2-cR 

c  i  VR. 

2-c3r 

2-D  PPF 

2~dR 

logR 

R 

2-c4VR 

2~c5r 

1-D  PSF 

R-2a 

R-2a 

^logRj  2a 

^logRj  2a 

2-D  PSF 

R~a 

logR 

R 

err 

at  most  log  penalty  with  polynomial  complexity 
(and  a  bit  more  work  gets  rid  of  logs...) 

Interesting  scaling  laws,  good  behavior  in  practice! 
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Directional  bases  and  contourlets  [M.Do] 

Goal:  find  a  discrete-space  construction  that  has  good  approximation 
properties  for  smooth  functions  with  smooth  boundaries 

•  directional  analysis  as  in  a  Radon  transform 

•  multiresolution  as  in  wavelets  and  pyramids 

•  computationally  easy 

•  bases  or  low  redundancy  frames 

Background: 

•  curvelets  [Candes-Donoho]  indicate  that  ‘’good’’  fixed  bases 
do  exist  for  approximation  of  piecewise  smooth  2D  functions 

1/2 

•  a  frequency-direction  relationship  indicates  a  scaling  law  d~j 


Idea: 

•  directional  analysis:  directions  are  key 

•  multiresolution  analysis 


Result: 

•  one-more-let:  contourlets! 


ISIT04  65 


Directional  Filter  Banks  [BambergerS:92,  DoV:02] 

•  divide  2-D  spectrum  into  slices  with  iterated  tree-structured  f-banks 


■  ■■■i 
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Example  of  directional  basis  functions 

•  64  channels,  elementary  filters  are  Haar  filters 

•  orthonormal  directional  basis 

•  64  equivalent  filters,  the  32  ‘’mostly  horizontal”  ones  are  shown 


iViV,V,V  .V.V.V.V  .W.V.W 


mm  mm  Wm\w.  "" 


This  ressembles  a  ‘’local  Radon  transform”,  or  radonlets! 

•  changes  of  sign  (for  orthonormality) 

•  approximate  lines  (discretizations) 
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Multiresolution  directional  pyramid 


Result: 

•  “tight”  pyramid  and  orthogonal  directional  channels  =>  tight  frame 

•  low  redundancy  <  4/3,  computationally  efficient 
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A  directional  multiresolution  analysis 


Theorem  [Do:01]:  For  a  finite  number  of  directions,  this  generates  a  tight 
frame  for  L2(R2)  with  frame  bound  equal  1 

Method:  Define  embedded  lowpass  directional  spaces  Vj  k 
and  directional  bandpass  spaces  Wj  k 


This  defines  contourlets:  how  do  they  compare  to  wavelets? 


Approximation:  M-term  NLA  satisfies  f-f 


contourlet 


2  1 


ivr 


[CandesD:00] 


possible  with  sine  filters, 

...  open  problem  if  compact  support  contourlets  exist.... 
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Basis  functions:  wavelets  versus  contourlets 


/-I 

Expansion  Example 


Pepper  image  and  its  expansion 


Compression,  denoising,  inverse  problems: 
if  it  is  sparse,  it  is  a  good  start! 
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Example:  denoising  with  contourlets 
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Shift-Invariance  and  Multiresolution  Analysis 

Most  sampling  results  require  shift-invariant  subspaces 

•  f(t)EVof(t-nT)eV  nEZ 

Wavelet  constructions  rely  in  addition  on  scale-invariance 

•  f(t)ev0^f(2mt)ev_m  mez 

Multiresolution  analysis  (Mallat,  Meyer)  gives  a  powerful  framework. 
Yet  it  requires  a  subspace  structure... 

Example:  uniform  or  B-splines 


Question:  can  sampling  be  generalized  beyond  subspaces? 
Note:  Shannon  BW  sufficient,  not  necessary! 
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A  Variation  on  a  Theme  by  Shannon 


Shannon,  BL  case:  f(t)  =  ^  f(nT)sinc(t/T-n)  or  1/T  degrees  of  freedom 
per  unit  of  time  nez 


But:  a  single  discontinuity,  and  no  more  sampling  theorem... 


Q:  Are  there  other  signals  with  finite  number  of  degrees  of  freedom  per 
unit  of  time  that  allow  exact  sampling  results? 

=  >  Finite  rate  of  innovation 

Usual  setup: 


x(t):  signal,  h(t) :sampling  kernel,  y(t) :filtering  of  x(t)  and  yn :samg^s7S 


A  Toy  Example 


K  Diracs  on  the  interval:  2K  degrees  of  freedom.  Periodic  case: 


1  x(t) 

i 

l 

i 

t 

t 

i 

j 

t 

l 

t 

1 

1 

x(t) 


K-l 
n£EZ  k=0 


S(t  -  tk  -  nx) 


j2jrm(t-tk) 


1  6 

m  (EZ 


Key:  The  Fourier  series  is  a  weighted  sum  of  K  exponentials 


K-l 


X[m]  =  -  J  cl 


k=0 


-j2jtmtk 

e  x 


Result:  Taking  2k+1  samples  from  a  lowpass  version  of  BW-(2K+1) 
allows  to  perfectly  recover  x(t) 

Method:  Yule-Walker  system,  annihilating  filter,  Vandermonde  system 
Note:  Relation  to  spectral  estimation  and  ECC  (Berlekamp-Massey) 
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A  Representation  Theorem  [VMB:02] 


For  the  class  of  periodic  FRI  signals  which  includes 

•  sequences  of  Diracs 

•  non-uniform  or  free  knot  splines 

•  piecewise  polynomials 


there  exist  sampling  schemes  with  a  sampling  rate  of  the  order  of  the 
rate  of  innovation  with  perfect  reconstruction  at  polynomial  cost. 


x(t)  h(t)  y(t),yn 


Variations: 

•  finite  length  signals,  local  kernels 

•  Two-dimensions 
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and  the  noisy  case.... 


Use  subspace  methods  (I.Maravic) 

Noisy  nonuniform  spline  and  reconstructed  signal  Noisy  piecewise  linear  signal  and  reconstructed  signal 


Application  example:  UWB  (low  rate  of  innovation. ..but  lots  of  noise!) 

0.025 

0.02 

0.015 

0.01 

0.005 

O 

-0.005 

-0.01 

-0.015 

-0.02 

O  2000  4000  6000  8000  1 0OOO  1 2000 


Transmitted  sequence  of  UWB  pulses  &  Received  signal 
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A  local  algorithm  for  FRI  sampling  [DVB:04] 


The  return  of  Strang-Fix! 


local,  polynomial  complexity  reconstruction,  for  diracs  and  piecewise 
polynomials 
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Conclusions 


Wavelets  and  the  French  revolution 

•  too  early  to  say? 

•  from  smooth  to  piecewise  smooth  functions 

Sparsity  and  the  Art  of  Motorcycle  Maintenance 

•  sparsity  as  a  key  feature  with  many  applications 

•  denoising,  inverse  problems,  compression 

LA  versus  NLA: 

•  approximation  rates  can  be  vastly  different! 

To  first  order,  operational,  high  rate,  D(R) 

•  improvements  still  possible 

•  low  rate  analysis  difficult 


Two-dimensions: 

•  really  harder!  and  none  used  in  JPEG2000... 

•  approximation  starts  to  be  understood,  compression  mostly  open 

•  contourlet  leads  to  efficient  algorithms 


Beyond  subspaces: 

•  FRI  results  on  sampling,  many  open  questions! 
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Outlook 


Do  we  understand  image  representation/compression  better? 

•  high  rate,  high  resolution:  there  is  promise 

•  low  rate:  room  at  the  bottom? 

New  images 

•  plenoptic  functions  (set  of  all  possible  images) 


•  non  BL  images  (FRI?) 

•  manifolds,  structure  of  natural  images 

Distributed  images 

•  interactive  approximation/compression 

•  SW,  WZ,  DKLT... 
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Why  Image  Representation  Remains  a  Fascinating  Topic... 
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